PSC 103B
We can access this dataset by installing the palmerpenguins package.
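A minimal sketch of loading the data, assuming the package is installed from CRAN under the name palmerpenguins. The slides do not show which command produced the overview that follows; glimpse() from dplyr is an assumption based on the shape of that output.

```r
# install.packages("palmerpenguins")  # run once if the package is not installed
library(palmerpenguins)  # provides the `penguins` data frame
library(dplyr)           # assumed here only for glimpse()

# Print a compact overview: one line per column with its type and first values
glimpse(penguins)
```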
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
Outcome variable: bill_length_mm
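Bill length was not recorded for every penguin, so this variable has some missing values (NA).
The complete.cases() function returns a logical vector (TRUE/FALSE) that is TRUE for every row with no missing values in the variable(s) you give it. You can use that vector to keep only the complete rows.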
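As a rough illustration, complete.cases() can be applied to the outcome variable and the result used to subset the data. The object names has_bill and penguins_complete are hypothetical and do not come from the slides.

```r
library(palmerpenguins)

# TRUE where bill_length_mm is observed, FALSE where it is NA
has_bill <- complete.cases(penguins$bill_length_mm)

# Keep only the rows with an observed bill length
# (penguins_complete is a hypothetical name, not from the slides)
penguins_complete <- penguins[has_bill, ]

sum(!has_bill)           # number of rows dropped
nrow(penguins_complete)  # number of rows kept
```

Because the result is a logical vector and not a set of row numbers, it goes directly inside the square brackets to select rows.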
Suppose we were interested in whether male penguins or female penguins had different bill lengths.
We suspected that male penguins have longer bill lengths than female penguins.
Let’s look at the mean bill length in each group.
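One possible way to compute the two means is to subset the outcome by sex and call mean() on each piece. This is a sketch, not necessarily the code used in class; the object names are hypothetical.

```r
library(palmerpenguins)

# which() drops the rows where sex is NA, so only true matches are selected
female_rows <- which(penguins$sex == "female")
male_rows   <- which(penguins$sex == "male")

# na.rm = TRUE ignores missing bill lengths when averaging
mean_female <- mean(penguins$bill_length_mm[female_rows], na.rm = TRUE)
mean_male   <- mean(penguins$bill_length_mm[male_rows],   na.rm = TRUE)

mean_female
mean_male
```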
Another way to do this is to use the tapply() function.
tapply(variable, group, function, extra arguments for the function)
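Filling in that template for the penguins data might look like the sketch below. Here na.rm = TRUE is the extra argument passed on to mean(), and group_means is a hypothetical name.

```r
library(palmerpenguins)

# variable = bill length, group = sex, function = mean, extra argument = na.rm
group_means <- tapply(penguins$bill_length_mm, penguins$sex, mean, na.rm = TRUE)

group_means
```

The result is a named vector with one mean per level of sex, so both groups are summarized in a single call.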
Is the observed difference of about 3.8 mm statistically significant?
\(H_0: \mu_{female} = \mu_{male}\), or the average bill length of females is the same as the average bill length of males.
\(H_1: \mu_{female} < \mu_{male}\), or the average bill length of females is less than that of males.
The t-test is trying to see whether the difference you observed between the groups is large given the expected variability of that difference across samples.
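To make that idea concrete, the statistic can be computed by hand as the observed difference divided by its standard error. The sketch below assumes the unequal-variance (Welch) form of the standard error, which is what R's t.test() uses by default; all object names are hypothetical.

```r
library(palmerpenguins)

# Bill lengths per group, with missing values removed
female <- penguins$bill_length_mm[which(penguins$sex == "female")]
male   <- penguins$bill_length_mm[which(penguins$sex == "male")]
female <- female[!is.na(female)]
male   <- male[!is.na(male)]

# Observed difference between the groups (female minus male)
diff_means <- mean(female) - mean(male)

# Expected variability of that difference across samples (standard error),
# without assuming the two groups share the same variance
se_diff <- sqrt(var(female) / length(female) + var(male) / length(male))

# t statistic: how many standard errors the difference is away from zero
t_manual <- diff_means / se_diff
t_manual
```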
Our hypothesis was that females have shorter bill lengths than males.
R treats females as Group 1 and males as Group 2 (because "female" comes before "male" alphabetically). The alternative hypothesis must therefore be stated for Group 1 relative to Group 2, that is, for the female mean minus the male mean.
Using the syntax in the next slide, replace the placeholders with the names of the variables we’re interested in.
Tip
The argument alternative specifies the alternative hypothesis and can take any of these three values: "two.sided", "less", or "greater". Think about our hypothesis to choose one of the alternatives.
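For this example, one plausible way to fill in the call uses the formula interface of t.test(). Since the hypothesis is that the female mean (Group 1) is smaller than the male mean (Group 2), the sketch uses alternative = "less"; welch_result is a hypothetical name.

```r
library(palmerpenguins)

# Formula interface: outcome ~ grouping variable
# "less" tests whether mean(female) - mean(male) is below 0
welch_result <- t.test(bill_length_mm ~ sex, data = penguins,
                       alternative = "less")

welch_result
```

Rows with a missing sex or bill length are dropped by the formula interface before the test is run.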
Welch Two Sample t-test
data: bill_length_mm by sex
t = -6.6725, df = 329.29, p-value = 5.332e-11
alternative hypothesis: true difference in means between group female and group male is less than 0
95 percent confidence interval:
-Inf -2.82883
sample estimates:
mean in group female   mean in group male
            42.09697             45.85476
A Welch two-sample t-test found that female penguins (M = 42.1, SD = 4.90) have, on average, shorter bill lengths than male penguins (M = 45.9, SD = 5.37), t(329.29) = -6.67, p < .001.
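The t.test() output does not print the standard deviations used in a write-up like this. One way to obtain them, sketched here with a hypothetical object name, is to reuse tapply() with sd().

```r
library(palmerpenguins)

# Standard deviation of bill length within each sex, ignoring missing values
group_sds <- tapply(penguins$bill_length_mm, penguins$sex, sd, na.rm = TRUE)

round(group_sds, 2)
```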
Notice that R gives us Welch’s t-test by default.
Welch’s t-test does not assume that the two groups have equal variances, so it remains appropriate when the group sizes and the variances differ. Not assuming equal variances is usually the safer choice.
If you want to assume equal variances, set the argument var.equal = TRUE.
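As a sketch, the same test with the equal-variance assumption only adds that one argument; pooled_result is a hypothetical name.

```r
library(palmerpenguins)

# var.equal = TRUE requests the classic pooled-variance two-sample t-test
pooled_result <- t.test(bill_length_mm ~ sex, data = penguins,
                        alternative = "less", var.equal = TRUE)

pooled_result
```

With equal variances assumed, the degrees of freedom are a whole number (the two group sizes added together, minus 2) instead of the fractional value reported by Welch's test.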
PSC 103B - Statistical Analysis of Psychological Data